Top-k Nearest Neighbor Search In Uncertain Data Series
نویسندگان
چکیده
Many real applications consume data that is intrinsically uncertain, noisy and error-prone. In this study, we investigate the problem of finding the top-k nearest neighbors in uncertain data series, which occur in several different domains. We formalize the top-k nearest neighbor problem for uncertain data series, and describe a model for uncertain data series that captures both uncertainty and correlation. This distinguishes our approach from prior work that compromises the accuracy of the model by assuming independence of the value distribution at neighboring time-stamps. We introduce theHolistic-PkNN algorithm, which uses novel metric bounds for uncertain series and an efficient refinement strategy to reduce the overall number of required probability estimates. We evaluate our proposal under a variety of settings using a combination of synthetic and 45 real datasets from diverse domains. The results demonstrate the significant advantages of the proposed approach.
منابع مشابه
An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملAn Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification
The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...
متن کاملNon-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
متن کاملOn Top-$k$ Weighted SUM Aggregate Nearest and Farthest Neighbors in the $L_1$ Plane
In this paper, we present algorithms for the top-k nearest neighbor searching where the input points are exact and the query point is uncertain under the L1 metric in the plane. The uncertain query point is represented by a discrete probability distribution function, and the goal is to efficiently return the top-k expected nearest neighbors, which have the smallest expected distances to the que...
متن کاملModeling and Querying Data Series and Data Streams with Uncertainty
Many real applications consume data that is intrinsically uncertain and error-prone. An uncertain data series is a series whose point values are uncertain. An uncertain data stream is a data stream whose tuples are existentially uncertain and/or have an uncertain value. Typical sources of uncertainty in data series and data streams include sensor data, data synopses, privacy-preserving transfor...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- PVLDB
دوره 8 شماره
صفحات -
تاریخ انتشار 2014